Feat/generic equipment upload #91
Merged
+1,532
−197
Generalize the Upload section beyond the canonical etcher:
- Add generic equipment_runs table (JSONB inputs/outputs) with RLS mirroring etcher_runs, plus runtime ensure migration and SQL scripts.
- Route Process: etcher -> sync_runs_pg (unchanged); any other equipment -> sync_equipment_runs_pg with a generic CSV->rows builder driven by the equipment config.
- Surface generic runs in /runs list and detail via a disjoint run_id offset so the existing Data Catalog renders them with no frontend rewrite.
- Template generator now includes registered outputs/parameters, not just features/targets.
- Fix the previously dead min/max validation (read features id/min, parameters min_value, targets.constraints) and make out-of-range a non-blocking warning.
- Upload page wording is equipment-generic and shows range warnings on success.
Co-authored-by: Cursor <cursoragent@cursor.com>
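The "disjoint run_id offset" mentioned above can be pictured with a minimal sketch. The offset value and both helper names are assumptions made for illustration; the PR text does not state them.

```python
from __future__ import annotations

# Hypothetical value: assumed to be larger than any etcher_runs run_id.
GENERIC_RUN_ID_OFFSET = 1_000_000_000


def to_catalog_run_id(equipment_run_pk: int) -> int:
    """Map a generic equipment_runs key into the shared catalog run_id space."""
    if equipment_run_pk < 0:
        raise ValueError("primary key must be non-negative")
    return GENERIC_RUN_ID_OFFSET + equipment_run_pk


def resolve_run_id(run_id: int) -> tuple[str, int]:
    """Return (table name, key within that table) for a catalog run_id."""
    if run_id >= GENERIC_RUN_ID_OFFSET:
        return "equipment_runs", run_id - GENERIC_RUN_ID_OFFSET
    return "etcher_runs", run_id
```

Because the two id ranges never overlap, a single integer is enough for the list and detail endpoints to decide which table to query.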
- Skip the template value-type descriptor row during ingestion (both etcher and generic builders) so a filled-in template processes cleanly.
- Attribute generic runs to their own equipment_id and resolve the display name via equipment_metadata instead of the project's equipment.
- Surface generic measured outputs in the catalog run-detail page and render generic input set-points (strings or numbers); add outputs to V2Run.
- Count generic equipment_runs in equipment inventory run metrics.
Co-authored-by: Cursor <cursoragent@cursor.com>
- Explicit-column templates now also include lot/timestamp keys and registered input parameters so no run metadata is silently dropped.
- process_upload rejects linking a run to a project whose equipment differs from the upload's equipment (empty project equipment stays unconstrained).
- run_to_jsonld serializes generic measured outputs and labels the dataset by its actual equipment instead of always "Etcher".
Co-authored-by: Cursor <cursoragent@cursor.com>
- _coerce_optional_float rejects NaN/Infinity so generic ingestion stores the raw token as text instead of emitting invalid JSONB and 500-ing.
- _fetch_equipment_run_metrics aggregates on the superuser connection so inventory run counts/last-data are complete fleet-wide rather than being hidden by RLS for private/shared projects.
Co-authored-by: Cursor <cursoragent@cursor.com>
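A minimal sketch of the NaN/Infinity rejection described above. The function bodies are assumptions (the PR shows no code), and `to_jsonb_value` is a hypothetical helper added only to show the text fallback. The motivation is that Python's `json.dumps` emits `NaN` and `Infinity` by default, which are not valid JSON.

```python
import math
from typing import Optional, Union


def coerce_optional_float(token: object) -> Optional[float]:
    """Return a finite float, or None for empty, non-numeric, NaN, or infinite input."""
    if token is None:
        return None
    text = str(token).strip()
    if not text:
        return None
    try:
        value = float(text)
    except ValueError:
        return None
    # float() happily parses "nan" and "inf"; reject them explicitly.
    return value if math.isfinite(value) else None


def to_jsonb_value(token: str) -> Union[float, str]:
    """Hypothetical helper: keep the number when finite, else the raw text token."""
    value = coerce_optional_float(token)
    return value if value is not None else token
```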
- Scope the catalog Download button explicitly to the etcher ML export (label + tooltip + filename) since that endpoint is the model-specific training matrix; generic runs remain browsable in the catalog.
- Validate generic run timestamps and required lot/timestamp cells during processing so malformed dates surface as actionable validation errors instead of a database-sync 500; store normalized ISO timestamps.
Co-authored-by: Cursor <cursoragent@cursor.com>
- Required upload columns are now the equipment's declared inputs (explicit columns, else features+parameters); outputs/targets and synthetic placeholder targets (primary_metric/secondary_metric) are never required, so valid CSVs for newly registered equipment are no longer rejected.
- Generic ingestion coerces each cell by its registered type, preserving zero-padded string ids, integers, and booleans instead of floating all.
- run_to_jsonld / _load_domain_config fall back to the equipment's PG config_json so FAIR metadata for PG-registered equipment carries units, ontology, hardware, and owner info.
Co-authored-by: Cursor <cursoragent@cursor.com>
- equipment_runs gains a stable (upload_id, row_index) identity; reprocessing an upload now upserts in place (preserving each run's id/run_id and catalog URL) and prunes only rows dropped by a shorter re-upload.
- _load_domain_config returns the file config only when present so the PG config_json fallback actually runs for database-only equipment.
- _get_feature_ontology_map now indexes registered outputs so their units and QUDT metadata appear in FAIR JSON-LD.
- Run-detail page renders boolean/string generic measurements explicitly instead of rendering nothing.
Co-authored-by: Cursor <cursoragent@cursor.com>
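The upsert-and-prune behavior keyed on (upload_id, row_index) can be illustrated in memory. This is only a sketch: the real change targets a PostgreSQL table, and the dictionary store, the counter standing in for a sequence, and the function name are all assumptions.

```python
from itertools import count

# Stand-in for a database sequence that hands out run ids (assumption).
_next_run_id = count(1)


def reprocess_upload(store: dict, upload_id: str, rows: list) -> None:
    """Upsert rows keyed by (upload_id, row_index), then prune leftovers."""
    for row_index, payload in enumerate(rows):
        key = (upload_id, row_index)
        if key in store:
            # Existing run: update the payload in place, keep its run_id.
            store[key]["payload"] = payload
        else:
            store[key] = {"run_id": next(_next_run_id), "payload": payload}
    # Remove only this upload's rows that the shorter re-upload no longer has.
    stale = [k for k in store if k[0] == upload_id and k[1] >= len(rows)]
    for key in stale:
        del store[key]
```

Reprocessing a three-row upload with two rows keeps the first two run ids unchanged and drops only the third entry; rows belonging to other uploads are never touched.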
- run_to_jsonld selects the etcher target fallback by equipment_id, so a non-etcher run with no outputs no longer emits null AvgEtchRate/RangeEtchRate properties and is correctly labeled.
- Template descriptor-row detection now requires the complete hint row and is only applied to the first data row, so a sparse data row that happens to equal a type token is never silently dropped.
Co-authored-by: Cursor <cursoragent@cursor.com>
_required_upload_columns now always requires the lot and timestamp metadata columns plus declared inputs (features + parameters), and excludes outputs even when a legacy config lists them as explicit columns. This stops anonymous/undated catalog runs for new equipment and stops wrongly requiring output columns (or omitting appended parameters) for explicit-column configs.
Co-authored-by: Cursor <cursoragent@cursor.com>
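The rule stated above (lot and timestamp, plus features and parameters, never outputs) could look roughly like the sketch below. The config layout and the key names `lot_column`, `timestamp_column`, `id`, and `name` are assumptions; the PR does not show the real schema.

```python
from typing import List


def required_upload_columns(config: dict) -> List[str]:
    """Lot/timestamp metadata columns plus declared inputs; outputs are never read."""
    # Assumed key names and defaults for the two metadata columns.
    required = [
        config.get("lot_column", "lot_id"),
        config.get("timestamp_column", "timestamp"),
    ]
    # Inputs may be declared under either section; outputs/targets are skipped.
    for section in ("features", "parameters"):
        for entry in config.get(section) or []:
            name = entry.get("id") or entry.get("name")
            if name and name not in required:
                required.append(name)
    return required
```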
- upload_file validates registered equipment against _required_upload_columns always (even with explicit columns), so an inputs-only CSV is processable.
- Generic ingestion reports an error for non-empty cells that cannot match their declared numeric/boolean type instead of silently storing bad data.
- Template descriptor-row detection only compares hint columns present in the uploaded row, so a trimmed inputs-only template is still recognized.
- Range warnings recorded at validation time are carried through processing (errors_json) and returned, keeping the processed-upload warning UI working.
Co-authored-by: Cursor <cursoragent@cursor.com>
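The descriptor-row rules from this commit and the earlier one (compare only hint columns present in the row, require every one of them to match, and check only the first data row) can be sketched as follows. Function names and the shape of the hint mapping are assumptions.

```python
from typing import Dict, List


def is_descriptor_row(row: Dict[str, str], hints: Dict[str, str]) -> bool:
    """True when every hint column present in the row holds exactly its hint token."""
    shared = [col for col in hints if col in row]
    if not shared:
        return False
    return all(
        (row[col] or "").strip().lower() == hints[col].lower() for col in shared
    )


def strip_descriptor_row(rows: List[Dict[str, str]], hints: Dict[str, str]) -> List[Dict[str, str]]:
    """Drop the template hint row, but only if it is the first data row."""
    if rows and is_descriptor_row(rows[0], hints):
        return rows[1:]
    return rows
```

A sparse row such as `{"pressure": "float", "recipe": ""}` is kept, because not every shared column matches its hint.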
- upload_file validates canonical-etcher CSVs against the full alias-aware REQUIRED_SYNC_COLUMNS (including outputs) so an upload accepted at validation time can actually be processed by _csv_upload_to_sync_rows.
- process_upload always returns errors as an array (empty when warning-free), restoring the API contract the upload page relies on (result.errors.length).
Co-authored-by: Cursor <cursoragent@cursor.com>
_declared_type_violation now flags non-integral values (e.g. 1.5) for columns declared int/integer/long, so fractional data can't be stored in fields the equipment schema marks as integers.
Co-authored-by: Cursor <cursoragent@cursor.com>
- Template value-type hints now honor declared parameter/output types (string/boolean/int), so a non-float field is no longer advertised as float.
- etcher_runs always report equipment_id/name as the canonical etcher (even when the project has no equipment association), so FAIR JSON-LD keeps etch-rate results and etcher labeling instead of degrading to a generic run.
Co-authored-by: Cursor <cursoragent@cursor.com>
…utput fixes
- etcher_runs are unconditionally identified/aggregated as the canonical etcher
(data_loader_pg base select and metadata_pg metrics), so a project associated
with another tool (or none) can't misattribute or drop canonical runs.
- Template hints prefer a non-float declared type over a unit, so boolean/int/
string fields advertise their real type while numeric fields keep unit hints.
- Generic ingestion omits None-valued input/output cells, so inputs-only or
partially-filled uploads don't store misleading {"result": null} measurements.
Co-authored-by: Cursor <cursoragent@cursor.com>
get_summary_stats_pg now aggregates etcher_runs together with equipment_runs for total/clean/outlier counts and the date range, so processing a non-etcher upload updates the dashboard's run statistics instead of leaving them unchanged.
Co-authored-by: Cursor <cursoragent@cursor.com>
_get_feature_ontology_map now indexes registered input parameters (unit, QUDT, prov_direction: input), so generic equipment that declare inputs under parameters rather than features keep their units and provenance in FAIR JSON-LD.
Co-authored-by: Cursor <cursoragent@cursor.com>
…etadata
PG-registered equipment store manufacturer/model/serial_number/location at the config top level, but run_to_jsonld reads them from config["domain"]. _load_domain_config now normalizes the PG config so these hardware/identity fields are backfilled into domain, keeping them in generic-run FAIR JSON-LD.
Co-authored-by: Cursor <cursoragent@cursor.com>
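The backfill of top-level hardware fields into `config["domain"]` might be sketched like this. The helper name and the rule that existing domain values win are assumptions; only the four field names and the `domain` key come from the commit message.

```python
import copy

HARDWARE_FIELDS = ("manufacturer", "model", "serial_number", "location")


def normalize_config(config: dict) -> dict:
    """Copy top-level hardware fields into config["domain"] when domain lacks them."""
    normalized = copy.deepcopy(config)  # leave the caller's dict untouched
    domain = normalized.get("domain") or {}
    normalized["domain"] = domain
    for field in HARDWARE_FIELDS:
        # Assumption: a value already present in domain takes precedence.
        if field in normalized and domain.get(field) in (None, ""):
            domain[field] = normalized[field]
    return normalized
```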
…jection
- Merged etcher/equipment run pages sort on the actual instant (offset-aware datetimes normalized to UTC) instead of raw ISO strings, so runs with different UTC offsets paginate in true chronological order.
- Generic ingestion rejects structurally malformed rows (extra cells under the DictReader None key, or short rows leaving columns at the None restval) so a directly-invoked process call can't turn a malformed CSV into a catalog run.
Co-authored-by: Cursor <cursoragent@cursor.com>
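To see why sorting raw ISO strings misorders runs with different UTC offsets, consider this small sketch. The helper name is hypothetical, and treating naive timestamps as UTC is an assumption the PR does not confirm.

```python
from datetime import datetime, timezone


def instant_sort_key(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp and normalize it to UTC for ordering."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


tokyo_run = "2024-01-01T10:00:00+09:00"  # 01:00 UTC
utc_run = "2024-01-01T02:00:00+00:00"    # 02:00 UTC

by_string = sorted([tokyo_run, utc_run])
by_instant = sorted([tokyo_run, utc_run], key=instant_sort_key)
```

Here `by_string` puts the UTC run first because "02" sorts before "10", while `by_instant` puts the Tokyo run first because it happened an hour earlier.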
…round-trip
Add _coerce_optional_int and use it for int/integer/long columns in both _coerce_typed and _declared_type_violation, so 64-bit identifiers beyond 2**53 keep full precision instead of being silently rounded via float.
Co-authored-by: Cursor <cursoragent@cursor.com>
…nfigs
- _coerce_optional_int parses decimal-form integers (e.g. 9007199254740993.0) via Decimal instead of binary float, preserving 64-bit values; non-finite Decimals are rejected.
- _load_domain_config normalizes file-backed/registered JSON configs too, so manufacturer/model/serial/location stored at the config top level appear in FAIR metadata for equipment that have a JSON snapshot, not just PG-only ones.
Co-authored-by: Cursor <cursoragent@cursor.com>
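A sketch of Decimal-based integer coercion in the spirit of the two commits above. The body is an assumption, not the PR's code; it shows why going through `float` loses precision beyond 2**53 while `Decimal` does not.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def coerce_optional_int(token: object) -> Optional[int]:
    """Return an exact int, or None for empty, non-numeric, non-finite, or fractional input."""
    if token is None or isinstance(token, bool):
        return None
    text = str(token).strip()
    if not text:
        return None
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    if not value.is_finite():
        return None  # rejects NaN and Infinity
    if value != value.to_integral_value():
        return None  # rejects fractional values such as 1.5
    return int(value)


exact = coerce_optional_int("9007199254740993.0")
via_float = int(float("9007199254740993.0"))
```

`exact` keeps the last digit (9007199254740993), whereas `via_float` is rounded to 9007199254740992 because 2**53 + 1 has no binary64 representation.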
…ions
Add _effective_equipment_config to backfill a registered equipment's explicit column schema from its top-level columns (columns_json) when config_json is empty/incomplete. upload_file and _load_equipment_config both use it, so the upload schema validation and generic processing no longer accept or drop files that omit registered columns. Restores test_upload_reports_missing_required_columns.
Co-authored-by: Cursor <cursoragent@cursor.com>
Merged run pagination now returns an empty page for limit=0 (matching the prior SQL LIMIT 0 semantics) instead of treating a falsy limit as unlimited-from-offset; limit=None remains unlimited and limit>0 returns the window.
Co-authored-by: Cursor <cursoragent@cursor.com>
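The limit semantics described above hinge on testing `limit is None` rather than truthiness. A minimal sketch, with a hypothetical function name and an in-memory list standing in for the merged run rows:

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], offset: int = 0, limit: Optional[int] = None) -> List[T]:
    """Slice a merged result list with SQL-like LIMIT/OFFSET semantics."""
    if limit is None:
        # No limit given: everything from the offset onward.
        return list(items[offset:])
    if limit <= 0:
        # limit=0 yields an empty page, like SQL LIMIT 0.
        # Treating negative limits as empty is an assumption of this sketch.
        return []
    return list(items[offset:offset + limit])
```

Writing `if not limit:` instead of `if limit is None:` is the kind of check that would make `limit=0` behave like "no limit".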
Generic ingestion now parses is_outlier, is_calibration_recipe, and outlier_type (via the standard column aliases) into the top-level row passed to sync_equipment_runs_pg, so uploaded generic runs keep their quality metadata instead of defaulting to false/empty and being misclassified in catalog filters, badges, and summary outlier counts.
Co-authored-by: Cursor <cursoragent@cursor.com>